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The understanding of complex systems has become a central issue because complex systems exist in a wide 
range of scientific disciplines. Time series are typical experimental results we have about complex systems. 
In the analysis of such time series, stationary situations have been extensively studied and correlations have 
been found to be a very powerful tool. Yet most natural processes are non-stationary. In particular, in times 
of crisis, accident or trouble, stationarity is lost. As examples we may think of financial markets, biological 
systems, reactors (both chemical and nuclear) or the weather. In non-stationary situations analysis becomes 
very difficult and noise is a severe problem. Following a natural urge to search for order in the system, we 
endeavor to define states through which systems pass and in which they remain for short times. Success in 
this respect would allow to get a better understanding of the system and might even lead to methods for 
controlling the system in more efficient ways. 

We here concentrate on financial markets because of the easy access we have to good data, because of our 
previous experience and last but not least because of the strong non-stationary effects recently seen. We 
analyze the S&P 500 stocks in the 19-year period 1992-2010. Here, we propose such an above mentioned 
definition of state for a financial market and use it to identify points of drastic change in the correlation 
structure. These points are mapped to occurrences of financial crises. We find that a wide variety of char- 
acteristic correlation structure patterns exist in the observation time window, and that these characteristic 
correlation structure patterns can be classified into several typical "market states". Using this classification 
we recognize transitions between different market states. A similarity measure we develop thus affords means 
of understanding changes in states and of recognizing developments not previously seen. 
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The effort to understand the dynamics in financial 
markets is attracting scientists from many fieldP^. Sta- 
tistical dependencies between stocks are of particular in- 
terest, because they play a major role in the estimation of 
financial risk^. Since the market itself is subject to con- 
tinuous change, the statistical dependencies also change 
in time. Thi s non -stationary behavior makes an analysis 
very 

difficult^SEU. Chan ges in supply and demand can 
even lead to a two phase behavior of the market^^. Here, 
we use the correlation matrix to identify and classify the 
market state. In particular we ask, how similar is the 
present market state, compared to previous states? To 
calculate this similarity we measure temporal changes in 
the statistical dependence between stock returns. 

For stationary systems described by a (generally large) 
number K of time series, the Pearson correlation coeffi- 
cient is extremely useful. It is defined as 

Here the and vj represent the time series of which the 
averages (...) are taken over a given time horizon T. 
and (Tj are their respective standard deviations. When 
calculating the correlation coefficients of K stocks, we 
obtain the K x K correlation matrix C, which gives an 
insight into the statistical interdependencies of the time 
series under study. 



It is necessary to consider data over large time horizons 
T so as to obtain reliable statistics. This leads to a funda- 
mental problem that arises in the case of non- stationary 
systems: To extract useful information from empirical 
data we seek a correlation matrix from very recent data, 
in order to provide a good description of current cor- 
relation structure. This is because correlations change 
dynamically due to the non-stationarity of the process, 
making it very difficult to estimate them precisel}ffiHi^. 
However, if the length T of the time series is short, the 
correlation matrices C are noisy. On the other hand, to 
keep the estimation error low, T can be increased, but 
this leads to a correlation matrix that generally does not 
describe the present state very well. Various noise reduc- 
tion techniques provide methods to conquer noisd^^^^. 

In several non-stationary systems, it is possible to ob- 
tain a large number of correlated data over time. Such 
systems include, but are not restricted to, financial mar- 
kets (which show non-stationary behavior due to crises), 
biological or medical time series (such as EEG), chemical 
and nuclear reactors (non-stationary behavior includes, 
in particular, accidents) or weather data. In the follow- 
ing, we only consider the financial markets, since we have 
studied extensively some very high quality data of this 
system, the non- stationary features of which have been 
quite striking in the last years. We propose a definition of 
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(a) daily data 
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(b) intraday data 

FIG. 1: Financial crisis are accompanied by drastic 
changes in the correlation structure, indicated by blue 

shaded areas. The market similarity ( in panel (a) is 
based on daily data. Panel (b) is a more detailed study 
of the 2007-2010 period, including the "credit crunch" 
and the initial impact of the european debt crisis. The 

area of panel (b) is a magnification of the lower right 
square in panel (a). 



a state which is appropriate for such systems and suggest 
a method of analysis which allows for a classification of 
possible behaviors of the system. When T/K < 1, which 
is the case we are interested in, the correlation matrix 



becomes singular. However, one can still make signifi- 
cant statistical statements, e.g., for the average correla- 
tion level whose estimation error decreases as 1/K. In 
the following, we focus on correlation matrices C(ti) and 
0(^2) at different times ti and ^2 measured over a short 
time horizon. These have therefore a pronounced random 
element. We take these objects as the fundamental states 
of our system. We now propose, as a central element, to 
introduce the following concept of distance between two 
states. We define 



C{h,t2) = {\Cij{h)-Cij{t2)\),, 



(2) 



to quantify the difference of the correlation structure for 
two points in time, where | . . . | denotes the absolute value 
and (. . denotes the average over all components. Note 
that in this case, the random component that is un- 
avoidable in the definition of the states of the system 
is strongly suppressed by the average over ^ 1 num- 
bers. 

To apply the above general statements to a specific ex- 
ample, we analyze two datasets: (i) we calculate C(^i7^2) 
based on the daily returns of those S&P 500 stocks that 
remained part of the S&P during the 19-year period 
1992-2010, and (ii) we study the four-year period 2007- 
2010 in more detail based on intraday data from the 
NYSE TAQ database. Since the noise increases for very 
high-frequency datsP^H^, we extract one-hour returns for 
dataset (ii). For one-hour returns, we consider this mar- 
ket microstructure noise as reasonably weak. 

However, sudden changes in drift and volatility are 
present on all time scales. They can result in erroneous 
correlation estimates. To address this problem, we em- 
ploy a local normalizatiorP^ of the return time series in 
dataset (i). The results of dataset (i) are presented in 
Fig. [Tal In this figure, each point is calculated on corre- 
lation matrices over the previous two months. This new 
representation gives a complete overview about struc- 
tural changes of this financial market of the past 19 years 
in a single figure. It allows to compare the similarity of 
the market states at different times. To make this pro- 
cedure concrete, consider the following example. Pick 
a point on the diagonal of Fig. [Ta| and designate it as 
"now". From this point the similarity to previous times 
can be found on the vertical line above this point, or the 
horizontal line to the left of this point. Light shading 
denotes similar market states and dark shading denotes 
dissimilar states. We can furthermore identify times of fi- 
nancial crises with dark shaded areas. This indicates that 
the correlation structure completely changes during a cri- 
sis. There are also similarities between crises, as between 
the "credit crunch" that induced the 2008-2009 financial 
crisis and the "market meltdown" , the burst of the dot- 
com bubble in 2002. A further example is the overall rise 
in correlation level in the beginning of 2007. This event 
can be niapped to drastic events on the Shanghai stock 
exchange^. 
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(i) Clustering tree 
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FIG. 2: The correlation between different industry branches as weh as the intra-branch correlation characterize the 
different market states (a-h). The inter-branch correlation is represented by the off-diagonal blocks, and the 
intra-branch correlation is represented by the blocks in the diagonal. Legend: E: Energy, M: Materials, I: 
Industrials, CD: Consumer Discretionary, CS: Consumer Staples, H: Health Care, F: Financials, IT: Information 
Technology, C: Communication, U: Utilities, (i) Similarity tree structure of the 8 market states, (j) Illustration of 

the overall average correlation matrix. 
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FIG. 3: Temporal evolution of the market state. The horizontal axis represents the observation time and the vertical 
axis denotes the market state obtained from top-down clustering. The market state sometimes remains in the same 

state for a long time, and sometimes for a short time in the same state. It also can return to a state that it has 
previously visited. Some states (e.g., state 1 and state 2) appear to cluster in time, while other states appear more 

sparsely and intermittently in time (e.g., state 4). 



Using dataset (ii) we are able to obtain a more detailed 



insight into recent market changes, as shown in Fig. lb 



This area is represented by the lower right square in 



Fig. la Using intraday data we calculate the correlation 
matrices on shorter time scales. We choose a time hori- 
zon of one week, which, because it provides insight into 
changes in the correlation structure on a much finer time 
scale, enables us to identify a short sub-period within 
the 2008-2009 crisis (in the beginning of 2009) during 
which the market temporarily stabilizes before it returns 
to the crisis state. While the correlation structure dur- 
ing the crisis displays an overall high correlation level, 
the correlation structure of the stable period is similar 
to the period before the crisis, one of the typical states 
in a calm period, which is identified from daily data in 
dataset (i). This phenomenon might be related to the 
market's reaction to news about the progress in rescuing 
the American International Group (A.I.G.)^^. The cor- 
relation structure of this stable period can be found in 
the Supplementary Material. 

The evolutionary structure presented in Figs. [Ta| and 
[Tbl illustrate that the correlation matrix sometimes main- 
tains its structure for a long time (bright regions), some- 
times changes abruptly (sharp blue stripes), and some- 
times returns to a structure resembling a structure the 
market has experienced before (white stripes). This sug- 
gests that the market might move among several typical 
market states. To extract such typical market states, 
we perform a clustering analysis in the results of dataset 
(i). From our clustering analysis (see Methods and Sup- 
plementary Material)^ we find that there are "hidden" 
states sparsely embedded in time, in addition to regimes 
that dominate the market during a continuous period 
and are easily found by eye. For this analysis, we use 
disjunct two- month time windows ending at the respec- 
tive dates. Because of the window length, some financial 
crashes cannot be resolved. Our aim is rather to identify 
the evolution of the market, which is, in some cases, in- 
duced by financial crisis. We can confirm in Fig. [2] that 
the typical states obtained from the clustering analysis 
indeed correspond to different characteristic correlation 
structures. To visualize these characteristic structures, 
we sort them according to their industry branch using 



the Global Industry Classification Standard (GICS]P. 
The industry branches correspond to the blocks on the 
diagonal. 

To visualize the characteristic structures of each state, 
we calculate its average correlation matrix and sort the 
companies according to their industry branch, as defined 
by the Global Industry Classification Standard (GIGS). 
The resulting matrices, the industry branches correspond 
to the blocks on the diagonal. The correlation between 
two branches are given by the off-diagonal blocks. The 
results are illustrated in Fig. |2] We can confirm that 
the typical states obtained from the clustering analysis 
indeed correspond to different characteristic correlation 
structures. 

Our analysis also offers insight into market structure 
dynamics. Figure [3] shows the temporal behavior of the 
market state. The market sometimes remains for a long 
time in the same state, and sometimes stays only for a 
short time. The typical duration depends upon the state: 
Some states (e.g., state 1 and state 2) appear in clusters 
in time while other states appear more sparsely in time 
(e.g., state 4). There seems to exist a global trend on a 
long time scale, although the market state is switching 
back and forth between states. 

In Fig. [2j we can see differences between the states in 
the correlation between branches as well as in the correla- 
tion within a branch. The correlation within the energy, 
information technology, and utilities branches is very 
strong in all states. State 1 shows an overall weak corre- 
lation, while states 3 and 4 feature in addition a strong 
correlation of the finance branch to other branches. State 
2 shows very unusual behavior: In the period of the dot- 
com bubble, many branches are anti-correlated with one 
another. In states 5, 6 and 7, the overall correlation level 
rises, although certain branches, such as energy, con- 
sumer staples, and utilities, are either strongly or weakly 
correlated with other branches. The energy branch (E) 
can be either strongly correlated to the rest of the market, 
weakly correlated, or even anti-correlated. Therefore we 
study the histogram of the correlation coefficients Cij (t) . 
We present the results in Fig. |4] In the months leading 
up to the credit crunch in October 2008, we observe a bi- 
modal structure in the histogram. It corresponds to the 
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0,0 0.5 

(b) Single histograms 



1,0 



FIG. 4: Footprint of the state transition in the 2008 

crisis by histograms of the correlation coefficients 
Cijit). (a) Surface plot for the time period September 
2007 to March 2009. We use a logarithmic scale to show 
the bimodal structure more clearly, (b) Histograms for 
September 2008 (black solid line) and December 2008 
(red dashed line). 



time period when the Energy branch shows a strong anti- 
correlation with other branches. The bimodality suggests 
that a subset of stocks - in this case, predominantly the 
Energy stocks - decouples from the rest of the market. 
During the crash, the histogram shows a very narrow 
distribution around large values of the correlation coeffi- 
cients, which corresponds to state 8 in Fig. |2j where the 
branch structure is lost almost completely in an overall 
strongly correlated market. 



CONCLUSION 

Our findings offer insight for constructing an "early 
warning system" for financial markets. By providing 



a simple instrument to identify similarities to previous 
states during an upcoming crisis, one can judge the cur- 
rent situation properly and be prepared to react if the 
crisis materializes. Another indication for a crisis is given 
when the correlation structure undergoes rapid changes. 

Using the similarity measure we were able to classify 
several typical market states between which the market 
jumps back and forth. Some of these states can easily be 
identified in the similarity measure. However, there are 
several states in which the market only stays for a short 
period. Thus, these states are sparsely embedded in time. 
With a clustering analysis, we were able to identify these 
states and disclose a detailed dynamics of the market's 
state. 

A possible application of the similarity measure is risk 
management. Given the similarity measure, the portfolio 
manager is aware of periods in which the market behaved 
completely differently and thus can choose not to include 
them in his calculations. He can furthermore identify 
regions in which the market behaved similarly and refer 
to these regions when estimating the correlation matrix. 

Our empirical study is a first step towards the identi- 
fication of states in financial markets which are a promi- 
nent example of complex non-stationary systems. 



METHODS 

A. Construction of stock returns 

Let S be the price of a specific stock and At the interval 
on which the return is calculated. For our study, we chose 
the arithmetic return, defined as 



r{t) 



^ S{t + At) - S{t) 

= W) 



(3) 



For dataset (i), we chose At to be 1 day and calculate 
the stock returns of each day. For dataset (ii), we chose 
At as 1 hour. Furthermore, we obtain this 1-hour return 
for every minute of a trading day between 10:45am and 
2:45pm. We obtain the daily data of dataset (i) from 
finance.yahoo.com. The intraday data is obtained from 
the New York Stock Exchange's TAQ database. 



B. Local normalization 

Sudden changes in drift and volatility can result in er- 
roneous correlation estimates. To address this problem, 
we employ a local normalization method^^. For each re- 
turn r{t) we subtract the local mean and divide by the 
local standard deviation. 



r{t) - {r{t))„ 



(4) 
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The local average (• • runs over the n most recent sam- 
pling points. For daily data, n = 13 yields nearly normal 
distributed time series, as recently discussed^. 



C. Outline of top-down clustering 

Our clustering analysis is based on a top-down scheme: 
All the correlation matrices are initially regarded as a 
single cluster and then divided into two clusters by the 
procedure based on the k-means algorithmP^M^. Each 
division step consists of the following process: 

1. Choose two initial cluster centers from all matrices. 
Label all other matrices by the more similar cluster 
center in terms of C^^^- 

(a) Recast two new cluster centers to the "center 
of mass" 

(b) Re-label all matrices to their most similar 
cluster center. 

(c) Repeat this process until there is no change in 
labeling. 

We stop this division process when the average dis- 
tance from each cluster center to its members becomes 
smaller than certain threshold. To identify the typical 
market states presented in the manuscript, we chose the 
threshold at 0.1465 as it represents the best ratio be- 
tween the distances between clusters and their intrin- 
sic radius. One can obtain finer structures by choosing 
smaller threshold values, ultimately until all the matri- 
ces are identified as different components. The complete 
results of the clustering analysis are available in the Sup- 
plementary Material 
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SUPPLEMENTARY MATERIAL 



Difference matrices to average correlation matrix 



Alternative measure: Difference of largest eigenvalue of 
correlation matrices 



A similar result can be archived using a different ap- 
proach. The largest eigenvalue Amax of the correlation 
matrix C describes the collective motion of all stocks. 
We can also define the similarity measure by the distance 
of these eigenvalues, 



Some of the correlation structures in Fig. [2] look quite 
similar at first sight. Their distinctiveness can be empha- 
sized by calculating the difference to the average corre- 
lation level. This is shown in Figj7| For example, state 
3 and 4 look very similar in Fig. [2[ However, Fig. [7| un- 
veils that the correlation within the Energy sector (E) is 
completely diverse. 



Ut{tut2) = |A^ax(C(ti)) - A^ax(C(t2))| 



(5) 



Figure [5] illustrates that this leads to an almost iden- 
tical result. The advantage of this technique is that the 
noise in the correlation matrix only contributes to small 
eigenvalue d^^ ' -^^l Thus, by only taking into account the 
largest one, we filter out the noise. However, this ap- 
proach also presumes that the corresponding eigenvector 
does not change. Our results indicate that the largest 
eigenvalue almost remains constant, but this might not 
always be the case. Especially during financial crises. 



Stable period within 2008-2009 crisis 

A detailed look of the correlation structure of the 2008- 
2009 crisis can be found in Fig.|6] While during the crisis, 
an overall high correlation level dominates, the structure 
stabilizes for a short time of 3 weeks. During this sta- 
ble period, the structure is very similar to state 7, that 
occurred just just before the crisis. 
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FIG. 5: Similarly matrix based on the difference 
correlation matrices largest eigenvalues. 
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(a) Crisis (2008/10/15 - 2009/4/1, excluding stable 
period) 



0.00 



CD 

CS 
H 



IT 



u 




0.60 
0.45 
0.30 
0.15 
0.00 
-0.15 



EMI CD cs H F IT CU 

(b) Stable period (2009/1/1 - 2009/1/21) 

FIG. 6: Within the 2008-2009 crisis, the market 
temporarily stabilizes. This stable state is very similar 
to the pre-crisis state that we identified from daily data 
(state 7). 




FIG. 7: (a)-(h): Difference of the states' correlation matrices to average correlation matrix (i). 



FIG. 8: The entire tree of the clustering analysis presented here for the threshold = 0: No termination of the 
division process takes place until all the correlation matrices are identified as different components. The large bold 
numbers represent the market states each of which consists of the matrices in the sub-trees below. Each right end of 
the tree corresponds to each 2-month term (year-term). Terms 1, 2, . . . , 6 correspond to January to February, 
March to April, . . . , November to December, respectively. The length of each branch represents the distance from 
the center of the subcluster to the center of the original cluster before the last dual division. 



